Abstract:The personalization of Stable Diffusion for generating professional portraits from amateur photographs is a burgeoning area, with applications in various downstream contexts. This paper investigates the impact of augmentations on improving facial resemblance when using two prominent personalization techniques: DreamBooth and InstantID. Through a series of experiments with diverse subject datasets, we assessed the effectiveness of various augmentation strategies on the generated headshots' fidelity to the original subject. We introduce FaceDistance, a wrapper around FaceNet, to rank the generations based on facial similarity, which aided in our assessment. Ultimately, this research provides insights into the role of augmentations in enhancing facial resemblance in SDXL-generated portraits, informing strategies for their effective deployment in downstream applications.
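The ranking step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes face embeddings have already been computed by some face embedding model (for example FaceNet) and are available as plain lists of floats. The function names `euclidean_distance` and `rank_by_face_distance` are hypothetical and are not the FaceDistance interface, which the abstract does not describe. Euclidean distance is also an assumption here.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_face_distance(reference, candidates):
    """Rank generated images by closeness to the reference face.

    reference:  embedding of the real subject (assumed precomputed).
    candidates: dict mapping an image name to its embedding.
    Returns a list of (name, distance) pairs, closest first.
    """
    scored = [
        (name, euclidean_distance(reference, embedding))
        for name, embedding in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy 2-D embeddings; real face embeddings have many more dimensions.
    reference = [0.0, 0.0]
    generations = {"gen_a": [3.0, 4.0], "gen_b": [1.0, 0.0], "gen_c": [0.0, 2.0]}
    for name, distance in rank_by_face_distance(reference, generations):
        print(name, round(distance, 3))
```

A smaller distance is read as a closer facial resemblance, so the first entry of the returned list is the generation that best matches the subject under this assumed metric.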
Abstract:The 3rd Workshop on Maritime Computer Vision (MaCVi) 2025 addresses maritime computer vision for Unmanned Surface Vehicles (USV) and underwater settings. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 700 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi25.
Abstract:Unmanned surface vehicles (USVs) and boats are increasingly important in maritime operations, yet their deployment is limited by costly sensors and complexity. LiDAR, radar, and depth cameras are either costly, yield sparse point clouds, or are noisy, and they require extensive calibration. Here, we introduce a novel approach for approximate distance estimation in USVs using supervised object detection. We collected a dataset comprising images with manually annotated bounding boxes and corresponding distance measurements. Leveraging this data, we propose a specialized branch of an object detection model that not only detects objects but also predicts their distances from the USV. This method offers a cost-efficient and intuitive alternative to conventional distance measurement techniques, aligning more closely with human estimation capabilities. We demonstrate its application in a marine assistance system that alerts operators to nearby objects such as boats, buoys, or other waterborne hazards.
Abstract:In this paper, we explore the application of Unmanned Aerial Vehicles (UAVs) in maritime search and rescue (mSAR) missions, focusing on medium-sized fixed-wing drones and quadcopters. We address the challenges and limitations inherent in operating some of the different classes of UAVs, particularly in search operations. Our research includes the development of a comprehensive software framework designed to enhance the efficiency and efficacy of SAR operations. This framework combines preliminary detection onboard UAVs with advanced object detection at ground stations, aiming to reduce visual strain and improve decision-making for operators. It will be made publicly available upon publication. We conduct experiments to evaluate various Region of Interest (RoI) proposal methods, especially by imposing simulated limited bandwidth on them, an important consideration when flying remote or offshore operations. This forces the algorithm to prioritize some predictions over others.
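The effect of a bandwidth limit on RoI proposals, as described in the abstract above, can be illustrated with a minimal sketch. It assumes that each proposal carries a confidence score, that its transmission cost is proportional to its pixel area, and that proposals are picked greedily by score. None of these choices is taken from the paper, and the names `RoI` and `select_rois` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoI:
    """A region of interest with a confidence score (hypothetical structure)."""
    x: int
    y: int
    width: int
    height: int
    score: float

    def cost_bytes(self, bytes_per_pixel=3):
        # Assumption: uncompressed crop, cost proportional to pixel area.
        return self.width * self.height * bytes_per_pixel


def select_rois(rois, budget_bytes, bytes_per_pixel=3):
    """Greedily keep the highest-scoring RoIs that fit into the byte budget."""
    selected = []
    remaining = budget_bytes
    for roi in sorted(rois, key=lambda r: r.score, reverse=True):
        cost = roi.cost_bytes(bytes_per_pixel)
        if cost <= remaining:
            selected.append(roi)
            remaining -= cost
    return selected


if __name__ == "__main__":
    proposals = [
        RoI(0, 0, 10, 10, score=0.9),    # 300 bytes
        RoI(50, 50, 20, 20, score=0.8),  # 1200 bytes
        RoI(5, 80, 5, 5, score=0.5),     # 75 bytes
    ]
    for roi in select_rois(proposals, budget_bytes=400):
        print(roi)
```

With a budget of 400 bytes, the large second proposal is skipped even though it scores higher than the third, which shows how a tight budget forces some predictions to be prioritized over others.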
Abstract:The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, and (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obstacle Segmentation and Detection category features three sub-challenges, including a new embedded challenge addressing efficient inference on real-world embedded devices. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 195 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi24.
Abstract:Yaw estimation of boats from the viewpoint of unmanned aerial vehicles (UAVs) and unmanned surface vehicles (USVs) or boats is a crucial task in various applications such as 3D scene rendering, trajectory prediction, and navigation. However, the lack of literature on yaw estimation of objects from the viewpoint of UAVs has motivated us to address this domain. In this paper, we propose a method based on HyperPosePDF for predicting the orientation of boats in the 6D space. For that, we use existing datasets, such as PASCAL3D+, as well as our own datasets, SeaDronesSee-3D and BOArienT, which we annotated manually. We extend HyperPosePDF to work in video-based scenarios, such that it yields robust orientation predictions across time. Naively applying HyperPosePDF on video data yields single-point predictions, resulting in far-off predictions and often incorrect symmetric orientations due to unseen or visually different data. To alleviate this issue, we propose aggregating the probability distributions of pose predictions, resulting in significantly improved performance, as shown in our experimental evaluation. Our proposed method could significantly benefit downstream tasks in marine robotics.
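The idea of aggregating per-frame probability distributions instead of relying on single-point predictions can be illustrated with a minimal sketch. It assumes yaw is discretized into a small number of bins and that frames are fused by multiplying their distributions (summing log-probabilities) and renormalizing. This fusion rule is an assumption for illustration only; the abstract does not specify how the distributions are aggregated. The function name `aggregate_distributions` is hypothetical.

```python
import math


def aggregate_distributions(frame_probs, eps=1e-9):
    """Fuse per-frame distributions over yaw bins into one distribution.

    frame_probs: list of per-frame probability lists, all of equal length.
    Assumed fusion rule: product of the distributions, renormalized.
    """
    n_bins = len(frame_probs[0])
    log_sum = [0.0] * n_bins
    for probs in frame_probs:
        if len(probs) != n_bins:
            raise ValueError("all frames must use the same number of bins")
        for i, p in enumerate(probs):
            log_sum[i] += math.log(p + eps)  # eps avoids log(0)
    peak = max(log_sum)
    weights = [math.exp(v - peak) for v in log_sum]  # shift for stability
    total = sum(weights)
    return [w / total for w in weights]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


if __name__ == "__main__":
    # Four yaw bins: 0, 90, 180, 270 degrees. The second frame prefers the
    # symmetric (flipped) orientation at 180 degrees.
    frames = [
        [0.60, 0.05, 0.30, 0.05],
        [0.35, 0.05, 0.55, 0.05],
        [0.70, 0.05, 0.20, 0.05],
    ]
    print("per-frame argmax:", [argmax(f) for f in frames])
    print("aggregated argmax:", argmax(aggregate_distributions(frames)))
```

In this toy example the per-frame prediction flips to the symmetric orientation in the second frame, while the aggregated distribution keeps its mode at the first bin.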
Abstract:This paper introduces a novel approach to video object detection and tracking on Unmanned Aerial Vehicles (UAVs). By incorporating metadata, the proposed approach creates a memory map of object locations in actual world coordinates, providing a more robust and interpretable representation of object locations in both image space and the real world. We use this representation to boost confidences, resulting in improved performance for several temporal computer vision tasks, such as video object detection, short and long-term single and multi-object tracking, and video anomaly detection. These findings confirm the benefits of metadata in enhancing the capabilities of UAVs in the field of temporal computer vision and pave the way for further advancements in this area.
Abstract:Unmanned aerial vehicles assist in maritime search and rescue missions by flying over large search areas to autonomously search for objects or people. Reliably detecting objects of interest requires fast models to employ on embedded hardware. Moreover, with increasing distance to the ground station only part of the video data can be transmitted. In this work, we consider the problem of finding meaningful region of interest proposals in a video stream on an embedded GPU. Current object or anomaly detectors are not suitable due to their slow speed, especially on limited hardware and for large image resolutions. Lastly, objects of interest, such as pieces of wreckage, are often not known a priori. Therefore, we propose an end-to-end future frame prediction model running in real-time on embedded GPUs to generate region proposals. We analyze its performance on large-scale maritime data sets and demonstrate its benefits over traditional and modern methods.
Abstract:The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation, and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code, and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
Abstract:Acquiring data to train deep learning-based object detectors on Unmanned Aerial Vehicles (UAVs) is expensive, time-consuming and may even be prohibited by law in specific environments. On the other hand, synthetic data is fast and cheap to access. In this work, we explore the potential use of synthetic data in object detection from UAVs across various application environments. For that, we extend the open-source framework DeepGTAV to work for UAV scenarios. We capture various large-scale high-resolution synthetic data sets in several domains to demonstrate their use in real-world object detection from UAVs by analyzing multiple training strategies across several models. Furthermore, we analyze several different data generation and sampling parameters to provide actionable engineering advice for further scientific research. The DeepGTAV framework is available at https://git.io/Jyf5j.